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1 Motivation and context 

The purpose of this article is to lay the mathematical foundations of a well known numer- 
ical approach in computational statistical physics, namely the parallel replica dynamics, 
introduced by A.F. Voter in |18j and improved and popularized in the context of Molecular 
Dynamics simulations in [19\ [XU| \T7\ 116] . for example. The aim of the approach is to effi- 
ciently generate a coarse-grained evolution (in terms of state-to-state dynamics) of a given 
stochastic process. The approach formally consists in concurrently considering several re- 
alizations of the stochastic process, and tracking among the realizations that which, the 
soonest, undergoes an important transition. Using specific properties of the dynamics gen- 
erated, a computational speed-up is obtained. In the best cases, this speed-up approaches 
the number of realizations considered. 

By drawing connections with the theory of Markov processes and, in particular, exploit- 
ing the notion of quasi- stationary distribution, we provide a mathematical setting appro- 
priate for assessing theoretically the performance of the approach, and possibly improving 
it. 



1.1 Description of the parallel replica dynamics 

Consider a stochastic dynamics Xt in M. d . In the following, we will focus on the overdamped 
Langevin dynamics: 

dx t = -vv{x t ) dt + ^iiF^dw t , (i) 

driven by a potential energy V : M d — > K. However, the algorithm we study equally applies 
to a Langevin equation or a kinetic Monte-Carlo (KMC) dynamics. 
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Intuitively, the dynamics (Q]) may be seen as the motion of X t in the energy landscape 
denned by the potential V. Such a landscape typically exhibits many wells. The process 
X t progressively discovers and successively explores these wells. The time spent by the 
process in a well before it hops to another one might be computationally prohibitively 
long, thus the need for an alternative numerical approach to the direct simulation of the 
process. The parallel replica dynamics is such an approach (others are discussed in |19|). 

We henceforth assume that V : M. d — > R is a smooth potential, such that f exp(— (3V) < 
oo. We consider an application 

S : R d -> N 

that associates to a given position x a state S(x) in a discrete space, say N. In practice 
and for instance, S maps a position x to the local minimum reached by the gradient 
dynamics (y = — VV(y)) starting from x, and these local minima are numbered (as they 
are discovered as the algorithm proceeds). For simplicity, we may think of S as any discrete 
valued map. The state-to-state dynamics we will henceforth consider as reference dynamics 
is (S(X t )) t > . 

The aim of the parallel replica dynamics is to generate a trajectory (St)t>o which is more 
efficiently computed than, but shares the same law as, the original trajectory (S(Xt))t>o 
obtained from Xt, the solution to (pQ). Two adjustable times, Tdephase anci T corr , will enter 
the definition of the dynamics (St)t>a- In practice, these times may be state-dependent, 
but in many practical situations, they are fixed a priori and taken to be equal. One purpose 
of the analysis below is to provide a theoretical guideline for the choice of these two times, 
and in particular T corr . 



The parallel replica dynamics as implemented in (18] consists of the following flow chart. 
Consider the initial condition X r ^ = Xq for a reference walker (Aj e ^)j>o- Consider the 
associated initial condition for the state dynamics So = S(Xq), and set the simulation 
clock T s i mu to zero. Then iterate on the following steps: 

1. Decorrelation step: Let the reference walker (X™* +t )t>o evolve according to ([T|) 
over a time interval t E [0,r corr ]. Then 



• If the process leaves the well during this time interval, namely if there exists a 
time t < T corr such that S { XJ^* S ( Xl e ^ ) , then 

\ 1 simu / \ J simu J 

— (i) advance the simulation clock by r corr : T s i mu = T s i mu + r corr , and 

— (ii) return to 1. 

• If not, then 

— (i) advance the simulation clock by T corr : T s i mu = T s i mu + T corr , and 

— (ii) proceed to 2. 

During this step, the state dynamics St is defined as: 

St = six^f) 

and is thus exact. 

2. Dephasing step : Replicate the walker X™ into N replicas (in practice ./V typically 
ranges between 10 2 and 10 4 ), that is, for k € {1, . . . , N}, set: 

vk _ Y re f 

± simu J- simu 
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Let these replicas evolve independently (according to ([T]) using independent Brownian 
motions) over a time interval t £ [0, Tdephase] an d using the following rule: if one of 
the replica (say K) leaves the well during the time interval [0, Td e phase]i then this 
particular replica is eliminated and the dephasing step for this particular replica is 
restarted from the initial position (Xft = X^ ). Throughout this step, the 

v -*- simu J- simu 

simulation clock is not advanced, nor are the state dynamics St updated. With a 
slight abuse of notation, we therefore still denote by X k . the positions of the 
replicas at the end of this dephasing step. 

3. Parallel step : Let all the replicas evolve independently and denote by 

= M{t > 0, S(X k Ts _ +t ) * S{X k Tsmu )} 

the first time the k-th. replica leaves the current well (denoted by W = {x, S(x) = 
S(X™ )). Introduce the first observed escape time over all the replicas 

v -*- simu J 7 

T = MT& 

k 

and the index 

K = arginfT^ 

k 

of the replica that first leaves the well. Note that T = T w °. Note also that the 
probability that many replica may leave simultaneously is zero, so that the index Kq 
is well-defined. The parallel step is terminated at the first observed escape event, in 
which case the simulation clock is advanced by NT (at least for synchronized CPUs, 
see the discussion at the end of Section [3]) and the position of the reference walker is 
set to the position of the particular replica that just underwent the transition: 

T ■ — T ■ 4- NT nnrl Y re f — Y K ° 

I stm« — ± simu TJ'i ctnu y\rp _i_/VT — -^rp 

Over the whole time interval [T s i mu ,T s i mu + NT] of length NT, the state dynamics 
St is constant and defined as: 

S t = S(Xl ). 

Return to 1. 



The question we are interested in is to understand the error intrinsically present in the 
numerical approach, namely the difference between the law of (St)t>o as defined by the 
above parallel replica dynamics and the law of (S(Xt))t>o, Xt being the solution to (pQ). 

Besides a formalized answer to the above question (see Proposition [6] below), the main 
outcomes of our work are the following: 

• the aim of the dephasing step of the parallel replica dynamics is to sample the so- 
called quasi- stationary distribution of the well currently visited by the dynamics; 
efficient approaches for completing this goal include the Fleming- Viot algorithm, 
which is a small variation of the approach originally implemented in the dephasing 
step of the algorithm; 

• the parallel replica dynamics can be proven to be, in a mathematical sense, an ap- 
proximation of the original dynamics and this holds irrespective whether this original 
dynamics is metastable or not; of course the efficiency of the approach is improved, 
and the calibration of the adjustable parameters is easier, when metastability occurs 
in a sense made precise below in terms of the spectral properties of a certain operator. 
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• for the parallel replica dynamics to be maximally efficient, the adjustable parameter 
be calibrated in terms of characteristic quantities of the original dynamics 
(see Equation (|22p below) ; the relevant quantities are expressed in terms of eigenval- 
ues of a given operator associated to the dynamics, but can be difficult to practically 
determine for an arbitrary definition of states. 



1.2 A first discussion and an outline of the article 

We first notice that if r corr is chosen infinite, the parallel step is never activated. The 
parallel replica dynamics then amounts to a simple, classical simulation of the dynamics ([I]). 
We are thus interested in situations where T corr is chosen finite. 

For the parallel step not to introduce any error, two essential assumptions are required 
on the replicas obtained after the dephasing step: 

[HI] the initial positions Xj, . f° r ^ ne P ara Uel step are i.i.d. and 

[H2] conditionally on the past FT aimu {J~t being the filtration generated by the Brownian 
motions used in the simulation up to time t), the stopping times Thj are exponentially 
distributed and are independent of the next visited state (for all k, and thus for k = 1 
since we suppose all the X k initially i.i.d.). 

Under these two assumptions (see Section [3] below) , 

(i) NT has the same law as Tyy (where we recall T = inf^ T^) and 

(ii) the next state visited by the replica that is the first to undergo a transition has the 
same law as the next visited state for one single arbitrary replica. 

In other words, under assumptions [HI] and [H2], the parallel step is "exact" in the sense 
that it updates the current state into a new state exactly equal (in law) to the state reached 
when one considers only one replica distributed according to the distribution obtained after 
the dephasing step, and waits for the time for this replica to undergo a transition to a new 
well. In terms of wall-clock time, the speed-up is of order N . This is the evident practical 
interest of the algorithm. 

Note that a motivation for considering [H2] is that a state-to-state dynamics Ut is a 
continuous-time Markov process if and only if it satisfies the following two conditions: 

• the list of visited states denoted by 

(U 1 ,U 2 ,...,U n ,...) 
is a Markov chain (a discrete-time and discrete-space Markov process) and 

• the times successively spent in each state, denoted by 

(Hi,H 2 , H n , . . .), 

(namely U t = U x for t G [0, H x ), U t = Ui for t € + . . . + fli-i,#i + . . . + Hi)), 
are such that (i) the law of Hi given U i is exponential and (ii) conditionally on Ui, 
the time Hi spent in the well and the next well visited Ui+i are independent random 
variables. 
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Thus, if S(Xt) was a Markov process, the algorithm would be exact, and the decorrelation 
step would not be needed. Each cycle of the decorrelation step can thus be seen as a test 
of the Markov character of S(X t ), in that, upon successful decorrelation, the system is 
deemed to be a proper starting point for a subsequent parallel stage. One may see our 
analysis as a way to quantify the error introduced by this assumption. 

After a few remarks on the underlying dynamics in the next section, our work is orga- 
nized as follows. In Section [21 we analyze the dephasing step. Then, Section [3] is devoted 
to the parallel step. Finally, our main result is presented in Section HJ where we analyze 
the error introduced by the decorrelation step. 

1.3 Remarks on the reference dynamics 

1.3.1 Overdamped Langevin and recrossing events 

The algorithm as described above may actually look weird for the continuous-in-time over- 
damped dynamics JT]). Indeed, the first time the process leaves a given state is immediately 
posterior to the time it entered that state (this phenomenon is called recrossing). Thus, 
after a parallel step, the process cannot remain in the new visited state during the first 
correlation time interval: the first decorrelation step is always unsuccessful for such a dy- 
namics. One simple way to overcome this difficulty is to let the reference walker evolve 
for a fixed small amount of time after the parallel step, before proceeding to the next 
decorrelation step. This allows the process to leave the vicinity of the boundary of the new 
visited state. Another way to deal with this difficulty would be to change the decorrelation 
step as: Let the reference walker evolve according to ([1]) over a time t that is the minimum 
time such that there is no transition over [t — T corr ,t]. 

1.3.2 Generalization to other dynamics 

We would like to mention that our analysis carries over to kinetic Monte Carlo models, 
namely for a pure jump Markov process valued in a finite state space. In this case, the 
map S reduces the original discrete state space to a coarser one. 

On the other hand, it is unclear how to generalize our study to a Langevin dynamics: 



since the underlying elliptic degenerate infinitesimal generator causes additional difficulties 
for the spectral analysis. Notice that the algorithm itself however readily applies to such 
a dynamics. 

2 The quasi-stationary distribution and the formalization of 
the dephasing step 

To start with, we discuss here the dephasing step. As mentioned above (see [HI] and 
[H2]), the purpose of this step is to generate independently distributed initial conditions 
for the parallel step, and to complete this according to a distribution such that the escape 
time is exponentially distributed and independent of the next visited state. We now explain 
here how to create, to some extent, an ideal dephasing step that satisfies both conditions 
[HI] and [H2]. The main ingredient of our formalization is the notion of quasi- stationary 
distribution, henceforth abbreviated as QSD. 




(2) 
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2.1 The quasi-stationary distribution 

Consider a state W C M d , and let 

7f = mf{i >0,Xf <?W} 

be the first escape time of W for the stochastic process Xf satisfying ([T]) and starting at 
x £ If at time 0. The state W is in practice a level set of the map S, and we suppose in 
the following that W is fixed, and is a bounded Lipschitz domain of M. d . A quasi-stationary 
distribution v, for the stochastic process Xt and associated to W, is a distribution with 
support in W and such that, for any positive time t and for any measurable set A C W, 

J P(Xf G A, t < Tfr) dv 

v{A) = — j • (3) 

/ F(t<T^)dv 
Jw 

In words, if Xq is distributed according to v, then, conditionally on not having left the 
well W up to time t, X t is still distributed according to v. 

For the convenience of the reader, we collect in this section a few elementary properties 
of the QSD. For more details on the theory, we refer, for example, to [3l [131 021 HI E3 EH 

Let Xt be the stochastic process satisfying (p}. We introduce its infinitesimal generator: 

L = - W • V + /3~ x A, 

and we denote by L* = div (VV-) + /3~ 1 A its adjoint. 

We start by stating a Feynman-Kac formula that will be useful below. 

Proposition 1 Consider a smooth solution v(t,x) to the problem: 

dtv = Lv for t > 0, x G W , 
< v = ip on dW , 

k v(0,x) = v (x), 

where if is a smooth function. Then, 

v(t, x)=E (l T ^ <t p(Xf» )) + E (l T ^> t v (X?)) , 

where Xf is the process starting at x at time and the first exit time from W . 
Proof : Fix a time t and consider u(s, x) = v(t — s, x), which satisfies 

' d s u + Lu = for se[0,t],i£f, 
< u = ip on dW, 

u(t, x) = v (x). 

Using Ito calculus, we see that: Vs 6 [0,T^ A t], 



u(s,X*)=u(0,x)+ (d s u + Lu)(r,X*)dr + V 2 / 3 " 1 / Vu(r, X r x ) dW r 
= u(0,x) + M S , 
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where M s = ^/W^ ^ V«(r, X?) dW r is a local martingale. Since u is assumed to be 
smooth, and X% lives in the bounded domain W up to time T w A t, we conclude: 

v(t,x) = u(0,x)=E(u(tATfr,X& T * r j) 

= E (l T * v<t u(Tf , + E (1^ > t «(t, Xf)) 

= E(l T , <t ^(X|^)) +E(l r ^>^ (Xf)). 

The quasi-stationary distribution is related to spectral properties of the generator L 
supplemented with zero Dirichlet boundary conditions on dW. Let us make this precise. 
We introduce the invariant measure for the dynamics Xt~. 

d/i = Z^ 1 exp(— /3V(x)) dx 

where Z = J* Rd exp(— j3V). It is well known that the dynamics ([1]) is reversible with respect 
to fi: for all smooth test functions / and g, 

[ fLgdfi= [ gLfd f i = -r 1 [ Vf-Vgd^. 

JM. d JR d JR d 

This in turn implies that the dynamics restricted to W is reversible with respect to \i 
restricted to W, that is: for all smooth test functions / and g vanishing on dW, 

[ fLgd^= [ gLfdv = -l3- 1 [ Vf-Vgd^. 
Jw Jw Jw 

Thus, the operator L with Dirichlet boundary conditions on dW is negative-definite and 
symmetric with respect to the scalar product 

(/,5V = / fgdfx. (4) 
Jw 

We denote by L 2 the Hilbert space of functions from W to M. which are square integrable 
with respect to fi, equipped with the scalar product Since V is assumed to be smooth, 
the inverse of the operator L from L 2 to L 2 is compact, and we thus introduce its eigen- 
values ( — Ai, — A2, • • • , — A n , . . .) counted with multiplicity: 

> -Ai > -A 2 > ... > -A n > ... (5) 

and the associated eigenfunctions 

(ux,u 2 , ■ ■ ■ ,Un, ■ ■ •) 

which we assume normalized: J w \u n \ 2 d[i = 1. Note that the kernel of L is reduced to 
(so that Ai > in ([5])). Using the fact that 

. . , r'Ayiv/i' 2 ^ ... 

(where H^q denotes the space of functions such that J |V/| 2 + f 2 dfi < 00 which vanish 
on dW) it follows by a standard argument (if u\ is a minimizer, then \u\ | is also a minimizer) 
that we may always assume that u\ is a signed, say nonnegative function. Using the 
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Harnack inequality, it is again standard to show u\ does not cancel on W . We therefore 
have 

u\ > on W, 

while u\ vanishes on dW . This in turn shows that Ai is non-degenerate (thus A2 > Ai 
in (JSJ)) and that the function U\ is the only signed eigenfunction. For these standard 
arguments, we e.g. refer to [H Section 8.12]). 
We now introduce the probability measure 

dv = - — 7 

Jw u i d V 

on W . It is standard that v is indeed a QSD: 

Proposition 2 The measure v defined by ^ is a QSD, that is, satisfies ([3]). In addition, 
v is the eigenfunction associated with the eigenvalue — Ai for the Fokker-Planck operator L* 
with homogeneous Dirichlet (also known as absorbing) boundary conditions. More precisely, 
if we denote by w = g~ = U\ exp(— (3V)/(Z J w u\dfi) the density of v with respect to the 
Lebesgue measure, we have 

I L*w = —X\w on W, 

n mxr (8) 
I w = on oW . 

The eigenvalue —\\ is the first eigenvalue of L* , and is non-degenerate. 

Proof : To get (J3|, it is sufficient to prove that for any smooth function / vanishing 
on dW: 

f E(f(X?)l t < T * r )dv = [ fdu [ P(t<T^)du. (9) 
Jw Jw Jw 

Denote by v(t,x) = E(/(Xf ) lt<T x ) and v(t,x) = P(i < Tm). It follows from Proposi- 
tion [T] that 

d t v = Lv for t > 0, x £ W, 
v = on dW, 
, v(0,x) = f(x), 

and v satisfies the same equation with initial condition v(0,x) = 1. Thus, we get: 

j t j E(f(Xf) U<r*,)dv = j t j v(t,x) Ul (x) d/i(J uidfji 

Lv(t, x)ui(x) dfi (J u\d[i\ 

v(t,x)Lui(x) dfi (J uid/j^j 

= — Ai J v(t, x)ui(x) dfi uidfi 
= -Ai J E(f(XnU<T^)du. 

This implies, 

J E(f(Xf)l t < T ^)du = j fdueM-^lt) 
which in turn yields ([9]) similarly arguing on v. 
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The relation between the spectrum of the operator L with Dirichlet boundary conditions 
on dW seen as on operator on and the operator L* with absorbing boundary conditions, 
follows the variational equality satisfied by the functions u k : for all test function / € , 

UkLf dp = / Vu k -Vfdp = X k / u k f dp,, 
w Jw Jw 

which shows that (— X k , u k exp(— f3V)) is an eigenvalue / eigenfunction couple for L*. The 
converse is obtained similarly. 

Remark 1 It will follow from Proposition below that there is actually a unique QSD 
on W. We will indeed prove there the (actually exponentially fast) long-time convergence 
to the QSD v for the process Xt conditioned to stay in W , irrespective of the initial distri- 
bution. 

The main proposition of this section is the following: 

Proposition 3 Consider the quasi- stationary distribution u associated to the dynamics (p} 
on Xt, and defined by ([7]). Then, if Xq is distributed following v, the first exit time Tyy 
from W is exponentially distributed and is a random variable independent of the first hitting 
point on dW . 

Proof : Consider, for a smooth test function <p : dW — > R, v solution to: 

dtv = Lv for t > 0, x € W, 
v = ip on dW, 
s v{0,x) = 0. 

We know from Proposition Q] that, for all t > and x € W, 

«(t,s)=E(l^ <t 
Consider now 

f(t)= [ v(t,x)du = E" (l Tw< MX Tw )) , 
Jw 

where the superscript v indicates that the process Xt we consider is assumed to start at 
t = under the quasi-stationary distribution v. Then, we have : 

f'(t)= [ d t v(t,x)du 
Jw 



Lv(t, x)dv 



w 



[ (-W • Vv + p- 1 Av) dv 
Jw 

j (Wiv (Wi/) - /3^Vv ■ Vv) - I VV ■ nv + /3 _1 Vv • nv 
Jw Jaw 

v (div (Wu) + (3- l Av) - /3~ 1 / vVv ■ n 
w Jaw 



j vL*v - ft' 1 j ipVv ■ n 
Jw Jaw 



-Xif + Ai / ip dp. 
law 
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where n denotes the outward normal to W, where 

p{dx) = — Vz/ ■ ndaQw 

Ai 

ctqw denotes the Lebesgue measure on dW, and where, with a slight abuse of notation, 
we denote by v both the probability measure and its density with respect to the Lebesgue 
measure on W, which is proportional to U\{x) exp(— j3V(x)). This yields 

E" (l Tw<t <p(X Tw )) = /(*) = (1 - exp(-Ait)) J ^ dp, 

which concludes the proof, p being then the first hitting point distribution on dW. 

A natural question for practical purposes is whether the fact that the exit time has 
an exponential law implies that the initial condition is distributed according to the QSD. 
Here is a necessary and sufficient condition. 

Proposition 4 Let us assume that Xt is solution to ([I]), with an initial condition Xq 
distributed according to a probability measure p$ with support in the well W and such that: 

( dno\ 2 , ^ 
— — da < oo. 

w\dp J 

Let us assume that the first exit time from W ( denoted by T\y ) is exponentially distributed. 

The initial distribution is necessarily the QSD (p,Q = v) if and only if the eigenvalues of 
the operator L on with zero Dirichlet boundary conditions are non- degenerate (see ([5]) ): 
Vi ^ j, Xi ^ Aj and Vfc > 2, 

\ u k dp ± 0, 
Jw 

where u k denotes the h-th eigenfunction (see ©J. 
Proof : The proof is divided into three steps. 

Step 1: A rewriting of¥(Tw > t) in terms of the eigenvalues and eigenf unctions of L. 

The assumption on T\y is equivalent to the fact that there exists a positive A such that, 
for all time t > 0, 

P(Tw >t) = exp(-At). 

Using the same reasoning as in the proof of Proposition [3l the left hand-side can be 
rewritten as: for all time t > 



F{T w >t) = J v(t,x)p {dx), (10) 

where v satisfies the partial differential equation: 

d t v = Lv for t > 0, x G W, 
v = on dW, 
v(0,x) = 1. 

Using the spectral decomposition of the operator L with homogeneous Dirichlet boundary 
conditions on dW, we have: 



)(t,x) = y^exp(-Afct) ( / u k dp)u k (a 
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This equality holds for example in the functional space C(WL + , LV). Using this decomposi- 



tion in ([10p . one gets 



exp(-At) = ¥(T W > t) = ^exp(-A fc f) u k df^j u k dfi ^j , (11) 

which holds for all time t > 0. Notice that the convergence of the series is normal, for the 
L°°-norm on t, since by Cauchy Schwarz, 



k>l 



u k d/j 



< oo. 



Step 2: One implication. 

Let us assume that the eigenvalues of the operator L are non-degenerate and that 
Vfc > 2, J w u k dfi ^ 0. Thus, in the limit t — > oo, the right hand-side of (fTT|) is equivalent 
to the first term of the series (since Ai is non-degenerate): 



y^exp(-Afct) ( / u k dn) ( / 
k>1 \Jw J \Jv\ 



u k dno ~ exp(-Aii) / uidfi / uidfi 
w J \Jw J \Jw 



Using now (jlip . this implies that 



Ai = A and ( / u\d\x I I / u\dfio ) = 1- 
'w J \Jw 



(12) 



Subtracting exp(— Xt) from both sides of (jlip . and repeating the argument, one gets that 
for all k > 2, 



u k d\i I u k dfi ) = 0. 
W J \JW 



(13) 



which implies that: VA; > 2, 



u k dfi = 0. 



ll 



Thus, -fj& only has a component along the first eigenfunction iii, which implies (using (I12|) ): 

dfx utd/j, 

dfio = — — d/i = — = du. 

dn JwUid/j. 

The initial condition is necessarily the QSD. 

Step 3: The other implication. 

Conversely, let us assume that: 3ko > 2, 

/ u ko dfi = 0. 
JW 

Let us then consider the measure /xq defined by 



dfi 



Ui 



Jw u ^ 



+ eu ko d/j,. 



Clearly, J w dfiQ = 1 and is a non-negative measure for e > sufficiently small (using 
the regularity of the eigenfunctions on W). Thus, is a probability measure which is 
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such that is satisfied, and thus an initial condition for (Q]) which is different from the 
QSD, but such that the exit time is exponentially distributed. 

Likewise, let us assume that one eigenvalue is degenerate: > 2, 

Afe = A fco+ i. 

Let us then consider the measure /j,q defined by 

dfJl ° = ( J ~u[dfi + 6 ( (X/ Uk ° +1 ^ Uk ° ~ (Ivy Uk ° ^) Ufeo+1 ) ) dfJ " 

Again, J w d^o = 1 and /io is a non-negative measure for e > sufficiently small. Thus, fj>o 
is a probability measure which is such that (jlip is satisfied, and thus an initial condition 
for ([1]) which is different from the QSD, but such that the exit time is exponentially 
distributed. <0 

A simple example of a situation where there exists a /co > 2 such that J w Uk dfj, = 
is the following: W = (0, l) d and V = on W (so that /i is simply the Lebesgue measure 
on W). Some eigenfunctions of the Dirichlet laplacian operator on (0, l) d indeed have 
zero mean. In dimension d = 1, one can for example consider the probability measure 
un with density proportional to sin^raj 1_ g sm (27ra;) (which is different from the QSD 

J sin(7nr) da; 

■p sm |"| - dx) to obtain exponentially distributed exit times from (0, 1). 
2.2 Formalization of the dephasing step 

As stated in Proposition [3l the crucial property of the QSD v is that if the process starts 
under v, then the exit time from W is exponentially distributed, and the hitting point on 
dW is independent from the exit time. The ideal dephasing step would therefore ensure 
that the replicas are independent and all share the QSD as law. Then, [HI] and [H2] would 
be fulfilled and the parallel step would be exact, as made precise below in Proposition [5j 

The actual dephasing step, as implemented in the current version of the algorithm, can 
thus be interpreted as an approximation procedure for the QSD of the well. It is conse- 
quently interesting to point out that basically two techniques are known in the literature 
to sample the QSD v. One method (called the Fleming- Viot method [2J El HI]) consists 
in launching a set of replicas in W, and when one of them leaves the well, to duplicate one 
of the other replicas. Then, one lets the number of replicas and the time go to infinity. In 
this limit, a finite fixed subset of replicas is i.i.d. with law the QSD. This method is very 
close to what is performed during the dephasing step in the original version of the algo- 
rithm presented in the introduction. The only slight modification is that the Fleming- Viot 
method consists in duplicating a replica when one leaves the well, rather than starting 
again the whole procedure from a fixed initial position. 

Another approach consists in considering only one walker in the well, and each time 
this walker leaves the well, redistribute it according to the empirical measure within the 
well up to the exit time. Again, one has to consider the distribution in the long-time limit 
to get the QSD, see PQ. This somehow justifies the intuition used in the decorrelation step 
that, if the process remains for a very long time in a well, it will be distributed according 
to the QSD, see Section H] below. 

In summary, it is reasonable, with a view to globally analyze the parallel replica dy- 
namics, to first replace the dephasing step by an ideal dephasing step, which consists in 
instantaneously drawing N initial positions for the replicas, independently and according 
to the QSD. The issue of generating that particular distribution, either using a dedicated 
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approach, or precisely using the dephasing step (as currently implemented), is a separate 
issue from analyzing the error introduced by the parallel replica dynamics. 



3 Analysis of the parallel step 

Our analysis of the parallel step is formalized in the following proposition, which shows 
that the parallel step does not introduce any additional error if the assumptions [HI] and 
[H2] are satisfied. 

Proposition 5 Consider N i.i.d. stochastic process , their escape times 

T^ = mf{t>0,X^W} 
from a bounded domain W, and the first escape time over all processes 

T = T*° where K = arg _ min _ 



• Assume that 
Then 

• Assume that 



ke{i,...,N} 
Tyy is exponentially distributed. 
NT has the same law as T^r. 

Tyy is independent of the first hitting point on dW . 



Then the first hitting point for on dW has the same distribution as the first 

hitting point for X\ and is independent of T^° . 

Proof : The first statement is standard. If is exponentially distributed, then 

<p(t) = P(T^ > t) = exp(-At), 

where A is the parameter of the exponential distribution of T^. Thus, considering T = 
min fe6{1) ... iJV} T^, we have 

F(T > t) = P ( min Tt > t 
\ke{i,...,N} 

= p(vfc€{l,...,JV}, T^>t) 

N 

= n 

k=l 

= exp(-NXt). 

This shows that T is exponentially distributed, with parameter iVA. Consequently, NT is 
exponentially distributed with parameter A. 

For the second assertion, the assumption can be written as: for all test functions / : 
8W->R, 
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where ip(t) = P(r^ > t) and p is the first hitting point distribution, with support on dW. 
Then, we have 



N 



12 E \f ( X I*) H>* II l T l w >T^ 
k=l \ l+k 



N 



k=l 

n([ fdp) r&is^-H-v'mds 

\JdW J Jt 



fdp)W)] 

<>\Y J 

-K 



N 



This shows that the first hitting point on dW for X t is distributed according to p, and 
-Wo- 

Three remarks are in order. 



is independent of TM ■ 



First, in the first assertion of Proposition [5] the fact that NT has the same law as Tyy 
is actually equivalent to Tyy being exponentially distributed. The former assertion indeed 
implies the functional equation: Vi > and ViV G N 

[<p(t/N)] N = <p(t), 

where cp(t) = P(T^ > t), the only solution to which is the exponential function. 

Second, without the assumption made in the second assertion, the first hitting point on 
dW for cannot generically have the same distribution as for X\. Indeed, if the first 
hitting point on dW and the exit time from W are coupled, the distribution of X £ would 

favor points on the boundary attained in shorter times, compared to the distribution of 
XL . This issue is a separate issue from that of having or not an exponential distribution 

for the exit time. The fact that the first hitting point distribution on dW for X^° is the 
same as for X\ implies that the next visited state is the same for the two processes. 



Finally and as shown by A.F. Voter in the original article |18j . the algorithm does not 
require synchronized processors to be used in practice, as would suggest the schematic 
presentation of the parallel step we give above. The parallel step above indeed assumes 
that the processors as synchronized, since T and Kq are defined in terms of the first 
replica which leaves the well, considering the same physical time unit for all replicas. If 
the processors are not synchronized, the parallel step is still exact by considering the first 
observed replica which leaves the well, and by advancing the simulation time by the sum 
of the physical times elapsed on each processor, instead of NT^° . We now justify this. 

Assume that, for n S {2, . . . ,N}, the n-th processor is p n times as fast as the first 
one. Then, (which is the time needed for the i-th replica, run on the i-th processor, 
to leave the well W) is exponentially distributed with parameter p{\. Then, consider 
r = mini< n <Ar the random time associated to the first detected event. One can check 
that (1 + p2 + . . . + Pn) T has the same law as Ty^. This means that advancing the simulation 
time by the sum of the (physical times) counted on each processor at the end of the parallel 
step is a correct approach. 
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This reasoning generalizes to non-constant in time processor speeds. Assume for sim- 
plicity that we have N = 2 processors, and that the speed of the second processor, com- 
pared to the first, is P2{t) (where P2 is deterministic and with values in (0, +00), and t is 
in the time- unit of the first processor), t w is exponentially distributed with parameter A: 
P( r ty — = exp(— At). The time t is measured in time-unit of the first processor. So, when 
the time is t on the first processor, the time is R2{t) = Jo P%( s ) ^ s 011 ^ e secon d processor. 
Thus, in the time-unit of the first processor, (which is the first time, in the time-unit of 
the first processor, an event occur on the second processor) is the image by R2 of an expo- 
nential law with parameter A : P(t^ > t) = exp(— XR 2 (t)). Thus r/L is not exponentially 
distributed anymore. Consider however r = rmn(r w , Tyy)- We have P(t < t) = P(t^ < 
*) F ( r w ^ *) = exp(-A(t + i? 2 (*))), so that r has density A(l + p 2 (t)) exp(-A(t + -R 2 (*)))• 
When an event occurs, one looks at the sum of the time actually spent on each processor, 
which is t + R 2 (t). And the law of t + R 2 (t) is indeed exponential with parameter A since 
E(/(r+ J R 2 (r))) = / f(u+R 2 {u))X{l+p 2 {u)) exp(-X(u+R 2 {u)) du = J f(z)Xexp(-Xz)dz. 

4 Analysis of the decorrelation step 

The dephasing step has now been formally replaced by independent draws according to 
the QSD, and we have formalized the parallel step. It now remains to analyze the error 
introduced, at the end of the decorrelation step, by the replacement of X T t e ^ by a random 
position distributed according to the QSD. Intuitively, it is expected that this instantaneous 
draw could at least be justified if there exists a timescale separation: when the process 
enters a new well, and if this new well is indeed a metastable region for the dynamics, then 
the process remains in the well sufficiently long to reach the quasi-stationary distribution 
of that well, before hopping to another well. It is the purpose of the decorrelation step 
to check that the process indeed remains in the well for a sufficiently long time. For the 
decorrelation step to be successful, we thus need the actual typical time to reach the QSD 
to be much smaller than the typical time to hop to another well. In this picture, T corr is 
seen as an approximation of the time to reach the QSD. The purpose of this section is to 
justify this rigorously. 

The decorrelation step is essentially a step that may be seen as a way to control the 
error associated to the instantaneous redrawing according to the QSD in the new state. 
This redrawing is only considered legitimate if the decorrelation step has been successful, 
that is, the process has spent a sufficiently long time in the current well. Any method that 
provides a control of this error would be an equally interesting "decorrelation step." 

We again consider Xf solution to JT]) with initial condition Xq £ W, where W (the 
well) is a bounded domain, subset of the state space. We denote by po the (arbitrary) 
distribution of Xq. We consider the process in the current well, and the joint distribution 
of the first hitting point on the boundary of the well and the first exit time 

T w = M{t > 0,X t gW}, 

when this point is hit. We first derive from the Markov character of (Xt)t>o a useful 
formula: 

Lemma 1 We have, for all ( deterministic ) times t and for all test functions f : x W — > 
R, 

E(f(T w -t,X Tw )\T w >t)= ! E(f(T^,X^))C(X t \T w > t)(dx), 

where Xf and T w respectively denote the process solution to ([1]) with initial condition x, 
and its associated first exit time from W . In the right-hand side, £(X t \Ty/ > t)(dx) denotes 
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the distribution of X t conditionally on T\y > t. Otherwise stated, 

E(f(T w - t,X Tw )\T w >t) = E(F(X t )\T w > t) 

where 

F(x)=E(f(Tfr,Xfy)). (14) 
Proof : This is equivalent to prove that 

E(f(T w -t,X Tw )l Tw > t ) = E(F(X t )l Tw > t ). 

The result is then obtained by conditioning by Tt (where T% is again the filtration generated 
by the Brownian motions used in the simulation up to time t) and using the Markov 
property: 

E(f(T w - t,X Tw )l Tw > t ) = E[E(f(T w - t,X Tw )l Tw > t \T t )} 

= E[E(f(T w - t,X Tw )\F t )l Tw >t] 

and E(f(T w - t,X Tw )\? t ) = F(X t ). 

Our purpose is now to estimate is the difference in law between the following two 
processes: the original process Xt considered above (starting from the arbitrary initial 
condition Xq), given that it has spent a sufficiently long time (say t) in the current well, 
and a similar process starting from the QSD v defined in ([7]) as initial distribution. We 
wish to estimate this difference in the limit t — > oo (which will then, in practice, be replaced 
by t > T corr ). 

We introduce the error 

e(t) = \E(f(T w -t,X Tw )\T w > t)-E v (f{T w ,X Tw ))\ (15) 

where we recall that the superscript v means that, in the second term (only!), the process 
Xt starts at time under the quasi-stationary distribution v introduced in ([7]). 



Before we state our main result on this error, we notice that, with F defined by (I14p . 
we have 



E{F{X t )\T w >t) 



v(t,x) dfio 
w 

v(t, x) d[io 
w 



since, we recall, (Iq denotes the law of Xq, 

v(t,x)=E(l T ^> t F(Xn) 

and 

v(t,x)=E(l T ^> t )=F(Tfr>t). 

By denoting again L the infinitesimal generator of (Xt)t>o, we know from Proposition [T] 
that 

dtv = Lv for t > 0, x G W, 
v = on dW , 
{v(0,x)=F(x), 

and v satisfies the same equation with initial condition v(0, x) = 1. 
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From the spectral decomposition of the operator L introduced in Section \2. 11 we there- 
fore get the following expressions for v and v: 

v(t,x) = V^exp(-Afct) ( / Fu k dfi) u k (x) 
fc>l ^ Jw J 

and 

u k (x) 



i{t,x) = y^exp(-Afct) ( / Ukd/j,) 



Thus, using the definition (J7|) of the QSD v. we have 

v(t, x) dfj,Q 



E(F(X t )\T w > t) 



w 



v(t, x) dfJLQ 

w 



V"exp(-A fe t) / Fu k d[i / u k dno 
^[ Jw Jw 

Y^exp(-Afct) / u k d[i u k d/i 
k>l Jw Jw 

/ mdfj, Fdv I u\ dfio + V,exp(— (A& - Ai)t) / Fu k d[i / u k d/j, 
Jw Jw Jw k>2 Jw Jw 

/ uidfj, m rijUp + y^exp(-(A fc - Ai)t) / u k dfj, u k d^ 
Jw Jw k>2 Jw Jw 

f FAv +Eexp(-(A t - A,)«) f y ^ f "f" 



1 + > exp(-(A fc - AijiH — T — 

Since m > 0, we note f w u\d^Q > and J w u\ d\i > 0. 

We are now in position to state the main result of this section: 

Proposition 6 Assume that the initial arbitrary distribution of Xq admits a Radon- 

Nikodym derivative — with respect to the invariant measure a of the dynamics Xt, such 
d/j, 

that 

[ (%°)\<oo. (17) 
J w \dfj,J 

Then, there exists a constant C (which depends on (jlq but not on f) such that, for all 
t > \^L\ 1 ; the error e(t) defined in (|15p satisfies 

e{t) <C\\f\\ L ~ exp(-(A 2 - Ai)i), 

where — A 2 < — Ai < are the first two eigenvalues of the operator L on the weighted space 

j 2 
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Proof : In order to evaluate (|15p . we first write, using Lemma [TJ 



e(t) = E(f(T w - t,X Tw )\T w > t) - E"(f(T w ,X Tw )) 



E(f(T w ,X^))C(X t \T w >t)(dx)- / E(f(T w ,X^))dv 



E(F(Xi)|2V >t)- f Fdv 
Jw 



where F is denned by (|14l) and the first term in the right-hand side has just been expressed 
in (fT6j). 

We therefore have: 



e(f) 



/ Fdi/ + ^exp(-(A fc -Ai)i)^ 

fe>2 JW 



J w Fu k d\i f w u k dfip 
ui d\i f w ui d[i 



l + ^exp(-(A fe -Ai)t 

k>2 

^exp(-(A fc - Ai)t 



f w u k dfx f w u k d^o 



Fdv 



/ w Fn fc d/i - / w Fdz; J w Ufc d/x J w n fc d/x 



k>2 



Jw U ± d ^ 



Jw u ± d Vo 



l + ^exp(-(A fc -Ai)t 



k>2 



J w u k dji J w u k dno 
Jw "i d ^ ni 



Thus 



e(t) < exp(-(A 2 - Ai)t) 



E 

fc>2 



\J w Fu k d/i J w u k d^o\ + Jw u ^ d ^ Jw u k d M 



Jw u i d A* d Vo 



l + ^exp(-(A fc -Ai)t) 



J w u& dii J w u k djio 



k>2 



J W Ul d ^ Jw Ul d ^ 
Now, we have by Cauchy-Schwarz and using the fact that ||.F||l°° < 



(18) 



E 

k>2 



IV 



Fu k dji I u k dfx. 



iv 



< 



E 

k>2 



Fu k d\x 



w 



\ 



E 

k>2 



duo, 
Uk -i—d/j, 
w dp, 



< 



F 2 d/i\ 



iv 



ir 



dfxo 



dfjL 



11' 



d/x 
dfi 



dfj, 



(19) 



and, likewise, 

E 



k>2 



IV 



Fdv I u k dfj, u k dfi 



IV 



< 



w 



dfi 



dfi. (20) 
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Arguing similarly on the denominator of (|18p . we obtain 



^exp(-(A fc - Ai)t) 



f w u k d\i j u k duo 



k>2 



f w ui d/i J ui duo 



< 



< 



exp(-(A 2 - \i)t) 
I w uidnJ Ul dno 



u k dfi 



exp(-(A 2 - Xi)t) 
Iw u i d ^ fw n i d ^c 




so that this quantity is smaller than 1/2 when t > , where C is a sufficiently large 

constant independent of /. Respectively inserting the inequalities (|19p - (|20p and the in- 
equality (j2~Tj) at the numerator and the denominator of (|18p . we obtain that for t > 



e(t) < 4exp(-(A 2 - Ai)t)vV(W0 
which concludes the proof of Proposition [6J 




Ai ' 



<> 



Note that the assumption ([17p on the initial condition no is not restrictive. For the 
conditioned diffusion process, the time evolution of the density is regularizing. Therefore, 
if (|17p is not satisfied at initial time, the density after a positive time to > does satisfy 
the condition, and we may argue with that density instead of the initial density in the 
proof of Proposition [6j 

Proposition [6] provides an error bound, in total variation norm, on the joint distribution 
of the exit time from W and the first hitting point on dW: for t > j^r^, 



sup 

/,II/IU*><1 



E(f(T w - t,X Tw )\T w >t)- E»(f(T w ,X Tw )) < C7exp(-(A 2 - Ai)t). 



This proposition shows that the correlation time r c , 

C 



should be chosen such that 



A 2 — Ai 

where C is such that C exp(— (A 2 — Ai)t) is small, so that the dephasing step and parallel 
step, which involve replicas initially distributed according to the QSD v do not introduce a 
large error in terms of the joint distribution of the exit time from the current state and the 
next visited state. Notice that one gets a conservative lower bound by taking Ai = 0, and 
that A 2 may be approximated in practice using an harmonic assumption (namely if V is 
close to a quadratic function in the well W). Within such an approximation, the analysis 
is also relevant for the Langevin dynamics ([2]). 

Note that — is the mean time to leave the well W, if the process starts from the QSD. 
Ai 

More generally, the mean time to leave the well W is given by 
E(2V) 



/oo 
P(T W > t) dt 

f'OO f 

/ / v(t, x) dfio dt 

f'OO r 

j Vexp(-Afct) / 
Jo k > ± Jm 



Uk dfi / Uk d^o dt 



w 



k>l 



Ukd[i \ u k du . 
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In order for the algorithm to be efficient, we therefore typically need that 



< Tcorr <E(T W ), (22) 

A2 — Ai 

so that, during the decorrelation step, the process reaches the QSD with good approxi- 
mation before leaving the well. The pending (and difficult) question is to make the above 
estimate more explicit, and therefore practically useful. Explicitly evaluating A2 — Ai is a 
question on its own. Considering more specific situations (metastable well in the limit of 
a small parameter, simple 2d periodic examples, ...) could help for this purpose. 
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